Re: reading custom input format in Spark

2014-04-08 Thread Andrew Ash
Anurag,

There is another method called newAPIHadoopRDD that takes in a Configuration object rather than a path. Give that a shot?

https://spark.apache.org/docs/latest/api/core/index.html#org.apache.spark.SparkContext

On Tue, Apr 8, 2014 at 1:47 PM, Anurag wrote:
> andrew - yes, i am using th

Re: reading custom input format in Spark

2014-04-08 Thread Anurag
andrew/nick, thx for the input, got it to work:

sc.hadoopConfiguration.set("record.delimiter.regex", "^[A-Za-z]{3},\\s\\d{2}\\s[A-Za-z]{3}.*")

:-)
-anurag

On Tue, Apr 8, 2014 at 1:47 PM, Anurag wrote:
> andrew - yes, i am using the PatternInputFormat from the blog post you
> referenced.
> I
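To see which lines the regex above would accept, here is a minimal, standalone sketch that needs neither Spark nor Hadoop. The sample lines and the names DelimiterRegexCheck and startsRecord are made up for illustration. The thread does not show how PatternInputFormat applies the pattern internally, so this only checks the regex against whole lines.

```scala
import java.util.regex.Pattern

object DelimiterRegexCheck {
  // The regex from the message above, written as a Scala string literal
  // (each "\\" in the literal is a single backslash in the regex).
  val delimiter: Pattern =
    Pattern.compile("^[A-Za-z]{3},\\s\\d{2}\\s[A-Za-z]{3}.*")

  // True when the whole line matches the pattern.
  def startsRecord(line: String): Boolean =
    delimiter.matcher(line).matches()

  def main(args: Array[String]): Unit = {
    // Hypothetical sample lines, not taken from the thread.
    val samples = Seq(
      "Tue, 08 Apr 2014 13:47:00 job started",
      "    continuation of the previous record",
      "Tuesday, 08 Apr 2014"
    )
    samples.foreach(line => println(s"${startsRecord(line)}\t$line"))
  }
}
```

Only the first sample matches: the pattern requires exactly three letters, a comma, whitespace, two digits, whitespace, and three more letters at the start of the line.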

Re: reading custom input format in Spark

2014-04-08 Thread Anurag
andrew - yes, i am using the PatternInputFormat from the blog post you referenced. I know how to set the pattern in configuration while writing a MR job, how do i do that from a spark shell?

-anurag

On Tue, Apr 8, 2014 at 1:41 PM, Andrew Ash wrote:
> Are you using the PatternInputFormat from

Re: reading custom input format in Spark

2014-04-08 Thread Nick Pentreath
Seems like you need to initialise a regex pattern for that inputformat. How is this done? Perhaps via a config option? In which case you need to first create a hadoop configuration, set the appropriate config option for the regex, and pass that into newAPIHadoopFile.

On Tue, Apr 8, 2014 at 10:36
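The steps described above (create a Hadoop configuration, set the regex option, pass it to newAPIHadoopFile) could be sketched roughly as follows. This is a sketch under assumptions, not a tested recipe: the helper name readWithPattern is made up, the key/value types LongWritable and Text are assumed for the input format, and the key "record.delimiter.regex" is the one reported elsewhere in this thread. The input format class is passed in as a parameter because its package is not given in the thread.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.InputFormat
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Hypothetical helper: reads `path` with a regex-delimited input format.
// ASSUMPTION: the format emits (LongWritable, Text) pairs and reads its
// pattern from the "record.delimiter.regex" configuration key.
def readWithPattern[F <: InputFormat[LongWritable, Text]](
    sc: SparkContext,
    path: String,
    regex: String,
    formatClass: Class[F]): RDD[String] = {
  // Copy the context's Hadoop settings so the shared configuration
  // is left unmodified.
  val conf = new Configuration(sc.hadoopConfiguration)
  conf.set("record.delimiter.regex", regex)

  sc.newAPIHadoopFile[LongWritable, Text, F](
      path, formatClass, classOf[LongWritable], classOf[Text], conf)
    // Hadoop may reuse Text objects, so convert to String right away.
    .map { case (_, text) => text.toString }
}
```

From the Spark shell, with the input format on the classpath, a call might look like readWithPattern(sc, "hdfs:///some/input", "^[A-Za-z]{3},\\s\\d{2}\\s[A-Za-z]{3}.*", classOf[PatternInputFormat]), where the path is a placeholder. Copying the configuration keeps the setting local to this read; setting it on sc.hadoopConfiguration directly instead affects later reads from the same context.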

Re: reading custom input format in Spark

2014-04-08 Thread Andrew Ash
Are you using the PatternInputFormat from this blog post?

https://hadoopi.wordpress.com/2013/05/31/custom-recordreader-processing-string-pattern-delimited-records/

If so you need to set the pattern in the configuration before attempting to read data with that InputFormat:

String regex = "^[A-Za-