You can run Zinc to speed up the build of Spark. 

Cheers

> On Mar 27, 2016, at 10:15 PM, Tenghuan He <tenghua...@gmail.com> wrote:
> 
> Hi Ted
> I changed
> def customable(partitioner: Partitioner): RDD[(K, V)] = self.withScope {
> to
> def customable(partitioner: Partitioner): MyRDD[K, V] = self.withScope {
> 
> After rebuilding the whole Spark project (since that takes a long time, I 
> didn't do it as you suggested at first), it also works.
> Thanks
> 
>> On Mon, Mar 28, 2016 at 11:01 AM, Tenghuan He <tenghua...@gmail.com> wrote:
>> Thanks very much Ted
>> 
>> I added MyRDD.scala to the Spark source code and rebuilt the whole Spark 
>> project, but using myrdd.asInstanceOf[MyRDD] doesn't work. It seems that 
>> MyRDD is not exposed to the spark-shell.
>> 
>> Finally I wrote a separate Spark application, added MyRDD.scala to that 
>> project, and the custom method can then be called from the main function; it 
>> works.
>> I had misunderstood how a custom RDD is used: it does not have to be added to 
>> the Spark project itself like UnionRDD or CoGroupedRDD; you can simply add it 
>> to your own project.
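>> 
>> Roughly, a minimal sketch of that approach (just an illustration, not the 
>> code I actually used; it assumes MyRDD is defined as in my first mail and 
>> compiled into the same project): an implicit class can add customable to any 
>> pair RDD without touching PairRDDFunctions or rebuilding Spark.
>> 
>> import org.apache.spark.Partitioner
>> import org.apache.spark.rdd.RDD
>> 
>> object MyRDDImplicits {
>>   // Enriches any pair RDD with customable; the result is statically typed
>>   // as MyRDD, so customMethod resolves without a cast.
>>   // Note: RDD.withScope is private[spark], so it is simply omitted here.
>>   implicit class CustomableOps[K, V](self: RDD[(K, V)]) {
>>     def customable(partitioner: Partitioner): MyRDD[K, V] =
>>       new MyRDD[K, V](self, partitioner)
>>   }
>> }
>> 
>> // Usage in the driver, after `import MyRDDImplicits._`:
>> //   val myrdd = baseRDD.customable(part)
>> //   myrdd.customMethod(bulk)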
>> 
>>> On Mon, Mar 28, 2016 at 4:28 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>> My interpretation is that the variable myrdd has static type RDD as far as 
>>> the REPL is concerned, even though it is an instance of MyRDD at runtime.
>>> 
>>> Using asInstanceOf in spark-shell should allow you to call your custom 
>>> method.
>>> 
>>> Here is the declaration of RDD:
>>> 
>>> abstract class RDD[T: ClassTag](
>>> 
>>> You can extend RDD and include your custom logic in the subclass.
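>>> 
>>> For example, something like this in spark-shell (just a sketch; I'm assuming 
>>> the Int/String type parameters from your session and that bulk is the 
>>> ArrayBuffer you already have):
>>> 
>>> val typed = myrdd.asInstanceOf[MyRDD[Int, String]]  // recover the concrete type
>>> typed.customMethod(bulk)                            // now resolves against MyRDD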
>>> 
>>>> On Sun, Mar 27, 2016 at 10:14 AM, Tenghuan He <tenghua...@gmail.com> wrote:
>>>> Thanks Ted,
>>>> 
>>>> but I have a doubt: as line 4 of the spark-shell session below shows, 
>>>> myrdd is already a MyRDD, so shouldn't that work?
>>>> 
>>>> 1 scala> val part = new org.apache.spark.HashPartitioner(10)
>>>> 2 scala> val baseRDD = sc.parallelize(1 to 100000).map(x => (x, "hello")).partitionBy(part).cache()
>>>> 3 scala> val myrdd = baseRDD.customable(part)  // here customable is a method added to the abstract RDD to create MyRDD
>>>> 4 myrdd: org.apache.spark.rdd.RDD[(Int, String)] = MyRDD[3] at customable at <console>:28
>>>> 5 scala> myrdd.customMethod(bulk)
>>>> 6 error: value customMethod is not a member of org.apache.spark.rdd.RDD[(Int, String)]
>>>> 
>>>>> On Mon, Mar 28, 2016 at 12:50 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>> bq. def customable(partitioner: Partitioner): RDD[(K, V)] = self.withScope {
>>>>> 
>>>>> In the above, you declare the return type as RDD, while you actually 
>>>>> intended to declare MyRDD as the return type (see the sketch below).
>>>>> Alternatively, you can cast myrdd to MyRDD in spark-shell.
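>>>>> 
>>>>> A sketch of the declaration with the intended return type, reusing the 
>>>>> constructor call from your snippet:
>>>>> 
>>>>>   def customable(partitioner: Partitioner): MyRDD[K, V] = self.withScope {
>>>>>     new MyRDD[K, V](self, partitioner)
>>>>>   }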
>>>>> 
>>>>> BTW, I don't think it is good practice to add a custom method to the base RDD.
>>>>> 
>>>>>> On Sun, Mar 27, 2016 at 9:44 AM, Tenghuan He <tenghua...@gmail.com> 
>>>>>> wrote:
>>>>>> Hi Ted,
>>>>>> 
>>>>>> The codes are running in spark-shell
>>>>>> 
>>>>>> scala> val part = new org.apache.spark.HashPartitioner(10)
>>>>>> scala> val baseRDD = sc.parallelize(1 to 100000).map(x => (x, "hello")).partitionBy(part).cache()
>>>>>> scala> val myrdd = baseRDD.customable(part)  // here customable is a method added to the abstract RDD to create MyRDD
>>>>>> myrdd: org.apache.spark.rdd.RDD[(Int, String)] = MyRDD[3] at customable at <console>:28
>>>>>> scala> myrdd.customMethod(bulk)
>>>>>> error: value customMethod is not a member of org.apache.spark.rdd.RDD[(Int, String)]
>>>>>> 
>>>>>> and the customable method in PairRDDFunctions.scala is 
>>>>>> 
>>>>>>   def customable(partitioner: Partitioner): RDD[(K, V)] = self.withScope {
>>>>>>     new MyRDD[K, V](self, partitioner)
>>>>>>   }
>>>>>> 
>>>>>> Thanks:)
>>>>>> 
>>>>>>> On Mon, Mar 28, 2016 at 12:28 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>>>> Can you show the full stack trace (or top 10 lines) and the snippet 
>>>>>>> using your MyRDD?
>>>>>>> 
>>>>>>> Thanks
>>>>>>> 
>>>>>>>> On Sun, Mar 27, 2016 at 9:22 AM, Tenghuan He <tenghua...@gmail.com> 
>>>>>>>> wrote:
>>>>>>>> Hi everyone,
>>>>>>>> 
>>>>>>>>     I am creating a custom RDD which extends RDD and adds a custom 
>>>>>>>> method; however, the custom method cannot be found.
>>>>>>>>     The custom RDD looks like the following:
>>>>>>>> 
>>>>>>>> import scala.collection.mutable.ArrayBuffer
>>>>>>>> 
>>>>>>>> import org.apache.spark.{Dependency, Partition, Partitioner, TaskContext}
>>>>>>>> import org.apache.spark.rdd.RDD
>>>>>>>> 
>>>>>>>> class MyRDD[K, V](
>>>>>>>>     var base: RDD[(K, V)],
>>>>>>>>     part: Partitioner
>>>>>>>>   ) extends RDD[(K, V)](base.context, Nil) {
>>>>>>>> 
>>>>>>>>   def customMethod(bulk: ArrayBuffer[(K, (V, Int))]): MyRDD[K, V] = {
>>>>>>>>     // ... custom code here
>>>>>>>>   }
>>>>>>>> 
>>>>>>>>   override def compute(split: Partition, context: TaskContext): Iterator[(K, V)] = {
>>>>>>>>     // ... custom code here
>>>>>>>>   }
>>>>>>>> 
>>>>>>>>   override protected def getPartitions: Array[Partition] = {
>>>>>>>>     // ... custom code here
>>>>>>>>   }
>>>>>>>> 
>>>>>>>>   override protected def getDependencies: Seq[Dependency[_]] = {
>>>>>>>>     // ... custom code here
>>>>>>>>   }
>>>>>>>> }
>>>>>>>> 
>>>>>>>> In spark-shell, it turns out that the overridden methods work well, 
>>>>>>>> but calling myrdd.customMethod(bulk) throws:
>>>>>>>> <console>:33: error: value customMethod is not a member of 
>>>>>>>> org.apache.spark.rdd.RDD[(Int, String)]
>>>>>>>> 
>>>>>>>> Can anyone tell me why the custom method cannot be found?
>>>>>>>> Or do I have to add customMethod to the abstract RDD and then override 
>>>>>>>> it in the custom RDD?
>>>>>>>> 
>>>>>>>> PS: spark-version: 1.5.1
>>>>>>>> 
>>>>>>>> Thanks & Best regards
>>>>>>>> 
>>>>>>>> Tenghuan
> 
