Hi Harsh,
You probably need to maintain some state for your values, since you are
updating some of the keys in each batch and checking against a global
state of your equation.

Can you take a look at the mapWithState API on DStream?
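A minimal sketch of what that could look like, assuming your input
arrives as "key,value" lines on a socket (the source, host/port and
checkpoint path below are just placeholders for your setup):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, State, StateSpec, StreamingContext}

val conf = new SparkConf().setAppName("StatefulSum").setMaster("local[2]")
val ssc = new StreamingContext(conf, Seconds(5))
ssc.checkpoint("/tmp/state-checkpoint")  // mapWithState requires checkpointing

// Hypothetical source: lines of "key,value" arriving on a socket.
val keyed = ssc.socketTextStream("localhost", 9999)
  .map(_.split(","))
  .map(parts => (parts(0), parts(1).toDouble))

// Keep a running sum per key across batches; emit (key, updatedSum).
val updateFunc = (key: String, value: Option[Double], state: State[Double]) => {
  val sum = state.getOption.getOrElse(0.0) + value.getOrElse(0.0)
  state.update(sum)
  (key, sum)
}

val runningSums = keyed.mapWithState(StateSpec.function(updateFunc))
runningSums.print()

ssc.start()
ssc.awaitTermination()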
Regards,
Rishitesh Mishra,
SnappyData (http://www.snappydata.io/)
Hi
Please help.
On Sat, 7 May 2016, 11:43 p.m., HARSH TAKKAR wrote:
Hi Ted

Following is my use case.

I have a prediction algorithm where I need to update some records in
order to predict the target.

For example, I have the equation Y = mX + c. I need to change the value
of Xi for some records and calculate sum(Yi); if the predicted value is
not close to the target value, I repeat the process.
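To make it concrete, something like the following loop is what I have
in mind (only a sketch; the sample records, m, c, target and step size
are placeholders):

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("IterativePrediction"))

// Placeholder records as (id, X) pairs.
var records: RDD[(Long, Double)] =
  sc.parallelize(Seq((1L, 10.0), (2L, 20.0), (3L, 30.0)))

val (m, c) = (2.0, 1.0)  // Y = m*X + c
val target = 150.0       // desired sum(Y)
val tolerance = 1e-3
val step = 0.1           // how much to nudge X each round

def sumY(rdd: RDD[(Long, Double)]): Double =
  rdd.map { case (_, x) => m * x + c }.sum()

var current = sumY(records)
var iter = 0
while (math.abs(current - target) > tolerance && iter < 100) {
  val delta = if (current < target) step else -step
  // RDDs are immutable, so each "update" derives a new RDD.
  records = records.map { case (id, x) => (id, x + delta) }
  current = sumY(records)
  iter += 1
}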
Using RDDs directly requires some 'low level' optimization techniques,
while using DataFrames / Spark SQL lets you leverage existing code.
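For example, with DataFrames the per-record update and the global
aggregate are both short (a sketch, assuming a table with columns id
and x; the path and the update rule are placeholders):

import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions._

val sqlContext = new SQLContext(sc)  // assumes an existing SparkContext sc
var df = sqlContext.read.parquet("/path/to/records")  // columns: id, x

val (m, c) = (2.0, 1.0)
// sum(Y) where Y = m*X + c, computed in a single aggregate.
val sumY = df.agg(sum(col("x") * m + c)).first().getDouble(0)

// "Updating" X for a subset of records just derives a new DataFrame.
df = df.withColumn("x",
  when(col("id") < 100, col("x") + 0.1).otherwise(col("x")))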
If you can share some more of your use case, that would help other people
provide suggestions.
Thanks
On May 6, 2016, at 6:57 PM, HARSH TAKKAR wrote:
Hi Ted
I am aware that RDDs are immutable, but in my use case I need to update
the same data set after each iteration.

Following are the options I was exploring:

1. Generating a new RDD in each iteration (it might use a lot of
memory; see the sketch after this list).
2. Using Hive tables and updating the same table after each iteration.
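For option 1, this is roughly what I was thinking of to keep memory
bounded: unpersist the previous iteration's RDD and periodically
checkpoint to truncate the lineage (the checkpoint path, update rule
and interval are placeholders):

// Assumes an existing SparkContext sc.
sc.setCheckpointDir("/tmp/rdd-checkpoints")

var current = sc.parallelize(1 to 1000000).map(_.toDouble)

for (i <- 1 to 50) {
  val next = current.map(_ * 1.01).persist()  // placeholder update rule
  if (i % 10 == 0) next.checkpoint()  // mark before the first action on it
  next.count()                        // materialize (writes checkpoint if marked)
  current.unpersist()                 // drop the previous iteration's blocks
  current = next
}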
Please see the doc at the beginning of the RDD class:

"A Resilient Distributed Dataset (RDD), the basic abstraction in Spark.
Represents an immutable, partitioned collection of elements that can be
operated on in parallel. This class contains the basic operations
available on all RDDs, such as `map`, `filter`, and `persist`."
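In other words, a transformation produces a new RDD instead of mutating
the one it was called on:

val nums = sc.parallelize(Seq(1, 2, 3))   // assumes an existing SparkContext sc
val doubled = nums.map(_ * 2)             // a new RDD; nums itself is unchanged
println(nums.collect().mkString(","))     // 1,2,3
println(doubled.collect().mkString(","))  // 2,4,6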