Well, passing state between custom methods is trickier. But why not merge both methods into one, so there's no need to pass state?
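Something along these lines, i.e. a rough, untested sketch (I'm guessing at
what your customMethod does, so the pass-through compute is just a
placeholder):

import scala.collection.mutable.ArrayBuffer
import org.apache.spark.{OneToOneDependency, Partition, Partitioner, TaskContext}
import org.apache.spark.rdd.RDD

// Take `bulk` in the constructor instead of setting it via a separate
// customMethod: the state travels with the RDD, so compute() can read
// it directly and nothing needs to be passed between two methods.
class MyRDD[K, V](
    var base: RDD[(K, V)],
    part: Partitioner,
    bulk: ArrayBuffer[(K, (V, Int))])
  extends RDD[(K, V)](base.context, Seq(new OneToOneDependency(base))) {

  override val partitioner = Some(part)

  override protected def getPartitions: Array[Partition] = base.partitions

  override def compute(split: Partition, context: TaskContext): Iterator[(K, V)] =
    // merge `bulk` into the parent's data right here; the plain
    // pass-through is only there to keep the sketch compiling
    base.iterator(split, context)
}

Then new MyRDD(base, part, bulk) replaces the separate customMethod call.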
--
Alexander aka Six-Hat-Thinker

> On 27 Mar 2016, at 19:24, Tenghuan He <tenghua...@gmail.com> wrote:
>
> Hi Alexander,
> Thanks for your reply
>
> In the custom RDD, there are some fields I have defined so that both the
> custom method and the compute method can see and operate on them; can a
> method in an implicit class do that?
>
>> On Mon, Mar 28, 2016 at 1:09 AM, Alexander Krasnukhin
>> <the.malk...@gmail.com> wrote:
>> Extending breaks chaining and is not nice. I think it is much better to
>> write an implicit class with extra methods. This way you add new methods
>> without touching the hierarchy at all, i.e.
>>
>> object RddFunctions {
>>   implicit class RddFunctionsImplicit[T](rdd: RDD[T]) {
>>     /**
>>      * Cache RDD and name it in one step.
>>      */
>>     def cacheNamed(name: String) = {
>>       rdd.cache.setName(name)
>>     }
>>   }
>> }
>>
>> ...
>>
>> import RddFunctions._
>>
>> val rdd = ...
>> rdd.cacheNamed("banana")
>>
>> ...
>>
>>> On Sun, Mar 27, 2016 at 6:50 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>> bq. def customable(partitioner: Partitioner): RDD[(K, V)] = self.withScope {
>>>
>>> In the above, you declare the return type as RDD, while you actually
>>> intended to declare MyRDD as the return type.
>>> Or, you can cast myrdd to MyRDD in spark-shell.
>>>
>>> BTW I don't think it is good practice to add a custom method to the base RDD.
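>>> i.e. either (untested, using the names from your snippet):
>>>
>>> def customable(partitioner: Partitioner): MyRDD[K, V] = self.withScope {
>>>   new MyRDD[K, V](self, partitioner)
>>> }
>>>
>>> or cast at the call site in spark-shell:
>>>
>>> val myrdd = baseRDD.customable(part).asInstanceOf[MyRDD[Int, String]]
>>> myrdd.customMethod(bulk) // resolves now that the static type is MyRDD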
>>>> On Sun, Mar 27, 2016 at 9:44 AM, Tenghuan He <tenghua...@gmail.com> wrote:
>>>> Hi Ted,
>>>>
>>>> The code is running in spark-shell
>>>>
>>>> scala> val part = new org.apache.spark.HashPartitioner(10)
>>>> scala> val baseRDD = sc.parallelize(1 to 100000).map(x => (x,
>>>> "hello")).partitionBy(part).cache()
>>>> scala> val myrdd = baseRDD.customable(part) // here customable is a
>>>> method added to the abstract RDD to create MyRDD
>>>> myrdd: org.apache.spark.rdd.RDD[(Int, String)] = MyRDD[3] at customable at
>>>> <console>:28
>>>> scala> myrdd.customMethod(bulk)
>>>> error: value customMethod is not a member of
>>>> org.apache.spark.rdd.RDD[(Int, String)]
>>>>
>>>> and the customable method in PairRDDFunctions.scala is
>>>>
>>>> def customable(partitioner: Partitioner): RDD[(K, V)] = self.withScope {
>>>>   new MyRDD[K, V](self, partitioner)
>>>> }
>>>>
>>>> Thanks :)
>>>>
>>>>> On Mon, Mar 28, 2016 at 12:28 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>> Can you show the full stack trace (or top 10 lines) and the snippet
>>>>> using your MyRDD?
>>>>>
>>>>> Thanks
>>>>>
>>>>>> On Sun, Mar 27, 2016 at 9:22 AM, Tenghuan He <tenghua...@gmail.com> wrote:
>>>>>> Hi everyone,
>>>>>>
>>>>>> I am creating a custom RDD which extends RDD and adds a custom
>>>>>> method; however, the custom method cannot be found.
>>>>>> The custom RDD looks like the following:
>>>>>>
>>>>>> class MyRDD[K, V](
>>>>>>     var base: RDD[(K, V)],
>>>>>>     part: Partitioner
>>>>>>   ) extends RDD[(K, V)](base.context, Nil) {
>>>>>>
>>>>>>   def customMethod(bulk: ArrayBuffer[(K, (V, Int))]): MyRDD[K, V] = {
>>>>>>     // ... custom code here
>>>>>>   }
>>>>>>
>>>>>>   override def compute(split: Partition, context: TaskContext): Iterator[(K, V)] = {
>>>>>>     // ... custom code here
>>>>>>   }
>>>>>>
>>>>>>   override protected def getPartitions: Array[Partition] = {
>>>>>>     // ... custom code here
>>>>>>   }
>>>>>>
>>>>>>   override protected def getDependencies: Seq[Dependency[_]] = {
>>>>>>     // ... custom code here
>>>>>>   }
>>>>>> }
>>>>>>
>>>>>> In spark-shell, it turns out that the overridden methods work well, but
>>>>>> when calling myrdd.customMethod(bulk), it throws:
>>>>>> <console>:33: error: value customMethod is not a member of
>>>>>> org.apache.spark.rdd.RDD[(Int, String)]
>>>>>>
>>>>>> Can anyone tell why the custom method cannot be found?
>>>>>> Or do I have to add the customMethod to the abstract RDD and then
>>>>>> override it in the custom RDD?
>>>>>>
>>>>>> PS: spark-version: 1.5.1
>>>>>>
>>>>>> Thanks & Best regards
>>>>>>
>>>>>> Tenghuan
>>
>> --
>> Regards,
>> Alexander