Hi Harsh,
You probably need to maintain some state for your values, since you are
updating some of the keys in each batch and checking against a global
state of your equation.

Can you take a look at the mapWithState API on DStream?
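A minimal sketch of what that could look like, assuming your input
arrives as "key,value" lines on a socket (the source, host/port and
checkpoint path below are just placeholders for your setup):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, State, StateSpec, StreamingContext}

val conf = new SparkConf().setAppName("StatefulSum").setMaster("local[2]")
val ssc = new StreamingContext(conf, Seconds(5))
ssc.checkpoint("/tmp/state-checkpoint")  // mapWithState requires checkpointing

// Hypothetical source: lines of "key,value" arriving on a socket.
val keyed = ssc.socketTextStream("localhost", 9999)
  .map(_.split(","))
  .map(parts => (parts(0), parts(1).toDouble))

// Keep a running sum per key across batches; emit (key, updatedSum).
val updateFunc = (key: String, value: Option[Double], state: State[Double]) => {
  val sum = state.getOption.getOrElse(0.0) + value.getOrElse(0.0)
  state.update(sum)
  (key, sum)
}

val runningSums = keyed.mapWithState(StateSpec.function(updateFunc))
runningSums.print()

ssc.start()
ssc.awaitTermination()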
Regards,
Rishitesh Mishra,
SnappyData (http://www.snappydata.io/)
Hi
Please help.
On Sat, 7 May 2016, 11:43 p.m., HARSH TAKKAR wrote:
Hi Ted

Following is my use case.

I have a prediction algorithm where I need to update some records in
order to predict the target.

For example, I have the equation Y = mX + c. I need to change the value
of Xi for some records and calculate sum(Yi); if the predicted value is
not close to the target value, I repeat the process.
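To make it concrete, something like the following loop is what I have
in mind (only a sketch; the sample records, m, c, target and step size
are placeholders):

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("IterativePrediction"))

// Placeholder records as (id, X) pairs.
var records: RDD[(Long, Double)] =
  sc.parallelize(Seq((1L, 10.0), (2L, 20.0), (3L, 30.0)))

val (m, c) = (2.0, 1.0)  // Y = m*X + c
val target = 150.0       // desired sum(Y)
val tolerance = 1e-3
val step = 0.1           // how much to nudge X each round

def sumY(rdd: RDD[(Long, Double)]): Double =
  rdd.map { case (_, x) => m * x + c }.sum()

var current = sumY(records)
var iter = 0
while (math.abs(current - target) > tolerance && iter < 100) {
  val delta = if (current < target) step else -step
  // RDDs are immutable, so each "update" derives a new RDD.
  records = records.map { case (id, x) => (id, x + delta) }
  current = sumY(records)
  iter += 1
}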
Using RDDs directly requires some 'low level' optimization techniques,
while using DataFrames / Spark SQL lets you leverage existing code.
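For example, with DataFrames the per-record update and the global
aggregate are both short (a sketch, assuming a table with columns id
and x; the path and the update rule are placeholders):

import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions._

val sqlContext = new SQLContext(sc)  // assumes an existing SparkContext sc
var df = sqlContext.read.parquet("/path/to/records")  // columns: id, x

val (m, c) = (2.0, 1.0)
// sum(Y) where Y = m*X + c, computed in a single aggregate.
val sumY = df.agg(sum(col("x") * m + c)).first().getDouble(0)

// "Updating" X for a subset of records just derives a new DataFrame.
df = df.withColumn("x",
  when(col("id") < 100, col("x") + 0.1).otherwise(col("x")))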
If you can share some more of your use case, that would help other people
provide suggestions.
Thanks
On May 6, 2016, at 6:57 PM, HARSH TAKKAR wrote:
Hi Ted
I am aware that RDDs are immutable, but in my use case I need to update
the same data set after each iteration.

Following are the options I was exploring:

1. Generating a new RDD in each iteration (it might use a lot of
memory; see the sketch after this list).
2. Using Hive tables and updating the same table after each iteration.
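For option 1, this is roughly what I was thinking of to keep memory
bounded: unpersist the previous iteration's RDD and periodically
checkpoint to truncate the lineage (the checkpoint path, update rule
and interval are placeholders):

// Assumes an existing SparkContext sc.
sc.setCheckpointDir("/tmp/rdd-checkpoints")

var current = sc.parallelize(1 to 1000000).map(_.toDouble)

for (i <- 1 to 50) {
  val next = current.map(_ * 1.01).persist()  // placeholder update rule
  if (i % 10 == 0) next.checkpoint()  // mark before the first action on it
  next.count()                        // materialize (writes checkpoint if marked)
  current.unpersist()                 // drop the previous iteration's blocks
  current = next
}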
Please see the doc at the beginning of the RDD class:

"A Resilient Distributed Dataset (RDD), the basic abstraction in Spark.
Represents an immutable, partitioned collection of elements that can be
operated on in parallel. This class contains the basic operations
available on all RDDs, such as `map`, `filter`, and `persist`."
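In other words, a transformation produces a new RDD instead of mutating
the one it was called on:

val nums = sc.parallelize(Seq(1, 2, 3))   // assumes an existing SparkContext sc
val doubled = nums.map(_ * 2)             // a new RDD; nums itself is unchanged
println(nums.collect().mkString(","))     // 1,2,3
println(doubled.collect().mkString(","))  // 2,4,6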