see also:
https://github.com/apache/spark/pull/6848
On Mon, Jun 29, 2015 at 12:48 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) wrote:
sc.hadoopConfiguration.set("mapreduce.input.fileinputformat.split.maxsize", "67108864")
sc.sequenceFile(getMostRecentDirectory(tablePath, _.startsWith("_")).get + "/*", classOf[Text], classOf[Text])
works
On Sun, Jun 28, 2015 at 9:46 PM, Ted Yu wrote:
There isn't a setter for sc.hadoopConfiguration.
You can directly change the value of a parameter in sc.hadoopConfiguration.
However, see the note in scaladoc:
* '''Note:''' As it will be reused in all Hadoop RDDs, it's better not to modify it unless you
* plan to set some global configurations for all Hadoop RDDs.
val hadoopConf = new Configuration(sc.hadoopConfiguration)
hadoopConf.set("mapreduce.input.fileinputformat.split.maxsize", "67108864")
sc.hadoopConfiguration(hadoopConf)
or
sc.hadoopConfiguration = hadoopConf
both threw an error.
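Since sc.hadoopConfiguration has no setter, one alternative that avoids mutating the shared configuration is to pass a cloned Configuration to sc.newAPIHadoopFile(), which accepts an explicit conf argument. A minimal sketch, assuming Text key/value sequence files; the path is a placeholder, not the actual table path from this thread:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.Text
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat

// Clone the context's configuration so the shared one stays untouched
// (per the scaladoc note above).
val jobConf = new Configuration(sc.hadoopConfiguration)
jobConf.set("mapreduce.input.fileinputformat.split.maxsize",
  (64L * 1024 * 1024).toString)  // 64 MB = 67108864 bytes

// newAPIHadoopFile takes an explicit Configuration, so the split-size
// override applies only to this RDD. "/path/to/table/*" is illustrative.
val rdd = sc.newAPIHadoopFile(
  "/path/to/table/*",
  classOf[SequenceFileInputFormat[Text, Text]],
  classOf[Text],
  classOf[Text],
  jobConf)
```

The per-RDD conf sidesteps the broadcast-of-global-configuration issue described below, at the cost of switching to the new-API input format classes.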
On Sun, Jun 28, 2015 at 9:32 PM, Ted Yu wrote:
sequenceFile() calls hadoopFile(), where:
val confBroadcast = broadcast(new SerializableConfiguration(hadoopConfiguration))
You can set the parameter in sc.hadoopConfiguration before calling
sc.sequenceFile().
Cheers
On Sun, Jun 28, 2015 at 9:23 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) wrote:
> I can do this