RDDs are immutable. How about making the class with a, b and c populated a base class? The class with e and f populated would then be a subclass.
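Roughly what I have in mind, as a minimal sketch rather than a definitive implementation. The names Base, Derived and enrich, the Int field types, and the bodies of calculateE and calculateF are all assumptions made up for illustration. A plain Seq stands in for the RDD so the snippet runs without Spark; rdd.map(enrich) would take the same function.

```scala
// Base class: only the fields that arrive from the streaming input.
// Field types are assumed to be Int; the thread does not say.
class Base(val a: Int, val b: Int, val c: Int) extends java.io.Serializable

// Subclass: the same fields plus the derived values e and f.
class Derived(a: Int, b: Int, c: Int, val e: Int, val f: Int)
  extends Base(a, b, c)

// Placeholder calculations (hypothetical, not from the thread).
def calculateE(a: Int, b: Int, c: Int): Int = a + b + c
def calculateF(a: Int, b: Int, c: Int): Int = a * b * c

// Build a new object instead of assigning to a var on the input.
def enrich(r: Base): Derived =
  new Derived(r.a, r.b, r.c,
    calculateE(r.a, r.b, r.c),
    calculateF(r.a, r.b, r.c))

// A Seq stands in for the RDD here.
val raw = Seq(new Base(1, 2, 3), new Base(4, 5, 6))
val enriched = raw.map(enrich)
```

The inputs in raw are left untouched, so recomputing a partition gives the same result every time.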
On Mon, Feb 15, 2016 at 11:55 PM, Hemalatha A <hemalatha.amru...@googlemail.com> wrote:

> Hello,
>
> Yes, Age was just for an illustration. The actual scenario is as below:
>
> Class XYZ(a,b,c){
>   val a = a
>   val b = b
>   val c = c
>   var e:Int = 0
>   var f:Int = 0
>   var g:Int = 0
> }
>
> Basically it is a Spark Streaming application, wherein
>
> 1. Some fields of my class come directly from the streaming
> input (a,b,c). I create XYZ(a,b,c), and e,f,g will take default values of 0.
>
> 2. Some fields will be calculated after some initial
> transformations (e and f based on a,b,c).
>
> I update the Rdd like:
>
> newRdd = rdd.map(obj=> {
>   if(needsCalculation){
>     obj.e = calculateE(a,b,c)
>     obj.f = calculateF(a,b,c)
>   }
>   obj
> })
>
> 3. And, based on these calculations, some other fields of the class
> should get calculated in the next transformation.
>
> finalRdd = rdd.map(obj=> {
>   if(needsCalculation){
>     obj.g = calculateE(e,f)
>   }
>   obj
> })
>
> So I created one class with all the variables, and I am now trying to update
> fields of the same class.
>
> On Tue, Feb 16, 2016 at 11:38 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> Age can be computed from the birthdate.
>> Looks like it doesn't need to be a member of the Animal class.
>>
>> If age is just for illustration, can you give an example which better
>> mimics the scenario you work on?
>>
>> Cheers
>>
>> On Mon, Feb 15, 2016 at 8:53 PM, Hemalatha A <
>> hemalatha.amru...@googlemail.com> wrote:
>>
>>> Hello,
>>>
>>> I want to know what the cons and performance impacts are of using a var
>>> inside a class object in an Rdd.
>>>
>>> Here is an example:
>>>
>>> Animal is a huge class with n number of val type variables (approx >600
>>> variables), but frequently we will have to update Age (just 1 variable)
>>> after some computation. What is the best way to do it?
>>>
>>> Class Animal(age: Int, name: String) = {
>>>   var animalAge:Int = age
>>>   val animalName:String = name
>>>   val ......
>>> }
>>>
>>> val animalRdd = sc.parallelize(List(Animal(1,"XYZ"), Animal(2,"ABC") ))
>>> ...
>>> ...
>>> animalRdd.map(ani=>{
>>>   if(ani.yearChange()) ani.animalAge+=1
>>>   ani
>>> })
>>>
>>> Is it advisable to use var in this case? Or can I do
>>> ani.copy(animalAge=2), which will reallocate the memory altogether for the
>>> animal? Please advise which is the best way to handle such cases.
>>>
>>> Regards
>>> Hemalatha
>>
>
> --
>
> Regards
> Hemalatha