Well, passing state between custom methods is trickier. But why not merge both methods into one, so there's no need to pass state?
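Something along these lines, i.e. a rough, untested sketch (I'm guessing at
what your customMethod does, so the pass-through compute is just a
placeholder):

import scala.collection.mutable.ArrayBuffer
import org.apache.spark.{OneToOneDependency, Partition, Partitioner, TaskContext}
import org.apache.spark.rdd.RDD

// Take `bulk` in the constructor instead of setting it via a separate
// customMethod: the state travels with the RDD, so compute() can read
// it directly and nothing needs to be passed between two methods.
class MyRDD[K, V](
    var base: RDD[(K, V)],
    part: Partitioner,
    bulk: ArrayBuffer[(K, (V, Int))])
  extends RDD[(K, V)](base.context, Seq(new OneToOneDependency(base))) {

  override val partitioner = Some(part)

  override protected def getPartitions: Array[Partition] = base.partitions

  override def compute(split: Partition, context: TaskContext): Iterator[(K, V)] =
    // merge `bulk` into the parent's data right here; the plain
    // pass-through is only there to keep the sketch compiling
    base.iterator(split, context)
}

Then new MyRDD(base, part, bulk) replaces the separate customMethod call.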
--
Alexander aka Six-Hat-Thinker

> On 27 Mar 2016, at 19:24, Tenghuan He <tenghua...@gmail.com> wrote:
>
> Hi Alexander,
> Thanks for your reply
>
> In the custom RDD, there are some fields I have defined so that both the
> custom method and the compute method can see and operate on them; can a
> method in an implicit class do that?
>
>> On Mon, Mar 28, 2016 at 1:09 AM, Alexander Krasnukhin
>> <the.malk...@gmail.com> wrote:
>> Extending breaks chaining and is not nice. I think it is much better to
>> write an implicit class with extra methods. This way you add new methods
>> without touching the hierarchy at all, i.e.
>>
>> object RddFunctions {
>>   implicit class RddFunctionsImplicit[T](rdd: RDD[T]) {
>>     /**
>>      * Cache RDD and name it in one step.
>>      */
>>     def cacheNamed(name: String) = {
>>       rdd.cache.setName(name)
>>     }
>>   }
>> }
>>
>> ...
>>
>> import RddFunctions._
>>
>> val rdd = ...
>> rdd.cacheNamed("banana")
>>
>> ...
>>
>>> On Sun, Mar 27, 2016 at 6:50 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>> bq. def customable(partitioner: Partitioner): RDD[(K, V)] = self.withScope {
>>>
>>> In the above, you declare the return type as RDD, while you actually
>>> intended to declare MyRDD as the return type.
>>> Or, you can cast myrdd to MyRDD in spark-shell.
>>>
>>> BTW I don't think it is good practice to add a custom method to the base RDD.
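>>> i.e. either (untested, using the names from your snippet):
>>>
>>> def customable(partitioner: Partitioner): MyRDD[K, V] = self.withScope {
>>>   new MyRDD[K, V](self, partitioner)
>>> }
>>>
>>> or cast at the call site in spark-shell:
>>>
>>> val myrdd = baseRDD.customable(part).asInstanceOf[MyRDD[Int, String]]
>>> myrdd.customMethod(bulk) // resolves now that the static type is MyRDD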
>>>> On Sun, Mar 27, 2016 at 9:44 AM, Tenghuan He <tenghua...@gmail.com> wrote:
>>>> Hi Ted,
>>>>
>>>> The code is running in spark-shell
>>>>
>>>> scala> val part = new org.apache.spark.HashPartitioner(10)
>>>> scala> val baseRDD = sc.parallelize(1 to 100000).map(x => (x,
>>>> "hello")).partitionBy(part).cache()
>>>> scala> val myrdd = baseRDD.customable(part) // here customable is a
>>>> method added to the abstract RDD to create MyRDD
>>>> myrdd: org.apache.spark.rdd.RDD[(Int, String)] = MyRDD[3] at customable at
>>>> <console>:28
>>>> scala> myrdd.customMethod(bulk)
>>>> error: value customMethod is not a member of
>>>> org.apache.spark.rdd.RDD[(Int, String)]
>>>>
>>>> and the customable method in PairRDDFunctions.scala is
>>>>
>>>> def customable(partitioner: Partitioner): RDD[(K, V)] = self.withScope {
>>>>   new MyRDD[K, V](self, partitioner)
>>>> }
>>>>
>>>> Thanks :)
>>>>
>>>>> On Mon, Mar 28, 2016 at 12:28 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>> Can you show the full stack trace (or top 10 lines) and the snippet
>>>>> using your MyRDD?
>>>>>
>>>>> Thanks
>>>>>
>>>>>> On Sun, Mar 27, 2016 at 9:22 AM, Tenghuan He <tenghua...@gmail.com> wrote:
>>>>>> Hi everyone,
>>>>>>
>>>>>> I am creating a custom RDD which extends RDD and adds a custom
>>>>>> method; however, the custom method cannot be found.
>>>>>> The custom RDD looks like the following:
>>>>>>
>>>>>> class MyRDD[K, V](
>>>>>>     var base: RDD[(K, V)],
>>>>>>     part: Partitioner
>>>>>>   ) extends RDD[(K, V)](base.context, Nil) {
>>>>>>
>>>>>>   def customMethod(bulk: ArrayBuffer[(K, (V, Int))]): MyRDD[K, V] = {
>>>>>>     // ... custom code here
>>>>>>   }
>>>>>>
>>>>>>   override def compute(split: Partition, context: TaskContext): Iterator[(K, V)] = {
>>>>>>     // ... custom code here
>>>>>>   }
>>>>>>
>>>>>>   override protected def getPartitions: Array[Partition] = {
>>>>>>     // ... custom code here
>>>>>>   }
>>>>>>
>>>>>>   override protected def getDependencies: Seq[Dependency[_]] = {
>>>>>>     // ... custom code here
>>>>>>   }
>>>>>> }
>>>>>>
>>>>>> In spark-shell, it turns out that the overridden methods work well, but
>>>>>> when calling myrdd.customMethod(bulk), it throws:
>>>>>> <console>:33: error: value customMethod is not a member of
>>>>>> org.apache.spark.rdd.RDD[(Int, String)]
>>>>>>
>>>>>> Can anyone tell why the custom method cannot be found?
>>>>>> Or do I have to add the customMethod to the abstract RDD and then
>>>>>> override it in the custom RDD?
>>>>>>
>>>>>> PS: spark-version: 1.5.1
>>>>>>
>>>>>> Thanks & Best regards
>>>>>>
>>>>>> Tenghuan
>>
>> --
>> Regards,
>> Alexander