RDD is immutable.

How about making the class with a, b and c populated a base class?
The class with e and f populated would then be a subclass.
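
As a rough sketch of that idea: the names Raw, Enriched, Stages, enrich, calculateE and calculateF, the Int field types, and the bodies of the two calculations are all placeholders I am assuming, since the real ones are not shown in this thread.

```scala
// Stage 1: fields that arrive directly from the streaming input.
// Int is an assumed type; Serializable so Spark can ship the objects.
class Raw(val a: Int, val b: Int, val c: Int) extends Serializable

// Stage 2: the same a, b, c plus the derived fields e and f, all immutable.
class Enriched(a: Int, b: Int, c: Int, val e: Int, val f: Int)
  extends Raw(a, b, c)

object Stages {
  // Placeholder calculations; the real ones are not given in the thread.
  def calculateE(a: Int, b: Int, c: Int): Int = a + b + c
  def calculateF(a: Int, b: Int, c: Int): Int = a * b * c

  // Builds a new object from the input instead of mutating it.
  def enrich(r: Raw): Enriched =
    new Enriched(r.a, r.b, r.c,
      calculateE(r.a, r.b, r.c),
      calculateF(r.a, r.b, r.c))
}
```

With something like this, the first transformation could be written as rdd.map(Stages.enrich), which yields a new RDD of Enriched objects and leaves the input objects untouched. The field g could probably be handled the same way with one more subclass.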

On Mon, Feb 15, 2016 at 11:55 PM, Hemalatha A <
hemalatha.amru...@googlemail.com> wrote:

> Hello,
>
> Yes, Age was just for illustration. The actual scenario is as below:
>
> // Int is assumed for a, b and c; their types were not given
> class XYZ(val a: Int, val b: Int, val c: Int) {
>   var e: Int = 0
>   var f: Int = 0
>   var g: Int = 0
> }
>
> Basically it is a Spark Streaming application, in which:
>
>    1. Some fields of my class come directly from the streaming input
>    (a, b, c). I create XYZ(a, b, c), and e, f, g take the default value 0.
>
>    2. Some fields are calculated after some initial transformations
>    (e and f, based on a, b, c).
>
> I update the RDD like this:
>
> val newRdd = rdd.map(obj => {
>   if (needsCalculation) {
>     obj.e = calculateE(obj.a, obj.b, obj.c)
>     obj.f = calculateF(obj.a, obj.b, obj.c)
>   }
>   obj
> })
>
>    3. Based on these calculations, some other fields of the class
>    should get calculated in the next transformation.
>
> val finalRdd = newRdd.map(obj => {
>   if (needsCalculation) {
>     obj.g = calculateE(obj.e, obj.f)
>   }
>   obj
> })
>
>
> So I created one class with all the variables, and I am trying to update
> fields of that same class.
>
> On Tue, Feb 16, 2016 at 11:38 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> Age can be computed from the birthdate.
>> Looks like it doesn't need to be a member of Animal class.
>>
>> If age is just for illustration, can you give an example which better
>> mimics the scenario you work on ?
>>
>> Cheers
>>
>> On Mon, Feb 15, 2016 at 8:53 PM, Hemalatha A <
>> hemalatha.amru...@googlemail.com> wrote:
>>
>>> Hello,
>>>
>>> I want to know the drawbacks and performance impact of using a var
>>> inside a class whose objects are stored in an RDD.
>>>
>>>
>>> Here is an example:
>>>
>>> Animal is a huge class with a large number of val fields (roughly 600
>>> or more), but we frequently have to update Age (just one variable)
>>> after some computation. What is the best way to do it?
>>>
>>> case class Animal(age: Int, name: String) {
>>>   var animalAge: Int = age
>>>   val animalName: String = name
>>> val ......
>>> }
>>>
>>>
>>> val animalRdd = sc.parallelize(List(Animal(1,"XYZ"), Animal(2,"ABC") ))
>>> ...
>>> ...
>>> animalRdd.map(ani=>{
>>>      if(ani.yearChange()) ani.animalAge+=1
>>>      ani
>>> })
>>>
>>>
>>> Is it advisable to use a var in this case? Or should I use
>>> ani.copy(animalAge=2), which allocates a whole new Animal object?
>>> Please advise on the best way to handle such cases.
>>>
>>>
>>>
>>> Regards
>>> Hemalatha
>>>
>>
>>
>
>
> --
>
>
> Regards
> Hemalatha
>
