Re: Fine control with sc.sequenceFile

2015-06-29 Thread Koert Kuipers
see also: https://github.com/apache/spark/pull/6848 On Mon, Jun 29, 2015 at 12:48 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) wrote: > sc.hadoopConfiguration.set("mapreduce.input.fileinputformat.split.maxsize", > "67108864") > > sc.sequenceFile(getMostRecentDirectory(tablePath, _.startsWith("_")).get > + "/*", classO

Re: Fine control with sc.sequenceFile

2015-06-28 Thread ๏̯͡๏
sc.hadoopConfiguration.set("mapreduce.input.fileinputformat.split.maxsize", "67108864") sc.sequenceFile(getMostRecentDirectory(tablePath, _.startsWith("_")).get + "/*", classOf[Text], classOf[Text]) works On Sun, Jun 28, 2015 at 9:46 PM, Ted Yu wrote: > There isn't setter for sc.hadoopConf

Re: Fine control with sc.sequenceFile

2015-06-28 Thread Ted Yu
There isn't setter for sc.hadoopConfiguration You can directly change value of parameter in sc.hadoopConfiguration However, see the note in scaladoc: * '''Note:''' As it will be reused in all Hadoop RDDs, it's better not to modify it unless you * plan to set some global configurations for al

Re: Fine control with sc.sequenceFile

2015-06-28 Thread ๏̯͡๏
val hadoopConf = new Configuration(sc.hadoopConfiguration) hadoopConf.set("mapreduce.input.fileinputformat.split.maxsize", "67108864") sc.hadoopConfiguration(hadoopConf) or sc.hadoopConfiguration = hadoopConf threw error. On Sun, Jun 28, 2015 at 9:32 PM, Ted Yu wrote: > seq

Re: Fine control with sc.sequenceFile

2015-06-28 Thread Ted Yu
sequenceFile() calls hadoopFile() where: val confBroadcast = broadcast(new SerializableConfiguration(hadoopConfiguration)) You can set the parameter in sc.hadoopConfiguration before calling sc.sequenceFile(). Cheers On Sun, Jun 28, 2015 at 9:23 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) wrote: > I can do this > >