Re: serialization issue with mapPartitions

2014-12-26 Thread Akhil
You cannot pass your jobConf object inside any of the transformation functions in Spark (like map, mapPartitions, etc.), since org.apache.hadoop.mapreduce.Job is not Serializable. You can use KryoSerializer (see this doc: http://spark.apache.org/docs/latest/tuning.html#data-serialization). We usually…
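
A minimal sketch of that Kryo setup, following the linked tuning guide (the app name and registered class are illustrative, not from this thread):

    import org.apache.spark.{SparkConf, SparkContext}

    // Switch Spark's data serializer to Kryo, per the tuning guide.
    val conf = new SparkConf()
      .setAppName("avro-sessions")  // illustrative name
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")

    // Optionally register frequently shipped classes for a compact encoding.
    conf.registerKryoClasses(Array(classOf[org.apache.avro.mapred.AvroKey[_]]))

    val sc = new SparkContext(conf)

Note that Kryo covers data serialization (shuffles, caching); task closures still go through the closure serializer, so the Job object itself must not be captured in a closure either way.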

RE: serialization issue with mapPartitions

2014-12-25 Thread Shao, Saisai
…within the closure. You can refer to org.apache.spark.rdd.HadoopRDD; there is a similar usage scenario like yours. Thanks, Jerry.
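
For reference, a sketch of the HadoopRDD-style approach (variable names are illustrative): the Hadoop Configuration is not Serializable but implements Writable, so it can be wrapped in Spark's SerializableWritable, broadcast, and unwrapped inside mapPartitions.

    import org.apache.hadoop.conf.Configuration
    import org.apache.spark.SerializableWritable

    // Wrap the non-serializable Configuration so it can be broadcast;
    // this mirrors what HadoopRDD does with its conf in Spark 1.x.
    val confBroadcast = sc.broadcast(new SerializableWritable(sc.hadoopConfiguration))

    val result = someRdd.mapPartitions { iter =>
      val conf: Configuration = confBroadcast.value.value  // unwrap on the executor
      // ... read whatever settings this partition needs from conf ...
      iter
    }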

Re: serialization issue with mapPartitions

2014-12-25 Thread Tobias Pfeiffer
Hi, On Fri, Dec 26, 2014 at 10:13 AM, ey-chih chow wrote: > I should rephrase my question as follows: How to use the corresponding Hadoop Configuration of a HadoopRDD in defining a function as an input parameter to the mapPartitions function? Well, you could try to pull the `val confi…
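
One way to read that suggestion (a sketch; the config key and names are illustrative): pull the values you need out of the Configuration on the driver, so the closure captures only plain serializable values rather than the Job/Configuration objects.

    // On the driver: copy the needed settings into plain local vals first.
    val hadoopConf = sc.hadoopConfiguration
    val inputDir: String = hadoopConf.get("mapreduce.input.fileinputformat.inputdir", "")

    val tagged = sessions.mapPartitions { iter =>
      // Only the String inputDir is captured, so the closure serializes cleanly.
      iter.map(record => (inputDir, record))
    }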

Re: serialization issue with mapPartitions

2014-12-25 Thread ey-chih chow
I should rephrase my question as follows: How to use the corresponding Hadoop Configuration of a HadoopRDD in defining a function as an input parameter to the mapPartitions function? Thanks. Ey-Chih Chow

Re: serialization issue with mapPartitions

2014-12-25 Thread Tobias Pfeiffer
Hi, On Fri, Dec 26, 2014 at 1:32 AM, ey-chih chow wrote:
> I got some issues with mapPartitions with the following piece of code:
>
> val sessions = sc
>   .newAPIHadoopFile(
>     "... path to an avro file ...",
>     classOf[org.apache.avro.mapreduce.AvroKeyInputFormat[ByteBuf…
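
For reference, a complete form of that call might look like the following (the ByteBuffer type parameter is a guess from the truncated snippet, and the path stays a placeholder):

    import java.nio.ByteBuffer
    import org.apache.avro.mapred.AvroKey
    import org.apache.avro.mapreduce.AvroKeyInputFormat
    import org.apache.hadoop.io.NullWritable

    // AvroKeyInputFormat produces (AvroKey[T], NullWritable) pairs, so the
    // key, value, and input-format classes must line up in newAPIHadoopFile.
    val sessions = sc.newAPIHadoopFile(
      "... path to an avro file ...",
      classOf[AvroKeyInputFormat[ByteBuffer]],
      classOf[AvroKey[ByteBuffer]],
      classOf[NullWritable])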