Hi Till,

Thank you for your answer. I have a couple of questions:

1) Setting the parallelism on a single map function works fine in local mode,
but will it behave the same way when the job runs in distributed mode?

2) Is there any other way to maintain this state apart from setting the
parallelism, for example something like Spark's aggregate function?

3) Is it necessary to call the execute() function after the transformations,
or does execution start as soon as an action is encountered (similar to Spark)?

4) Can I create a single global execution environment (either local or
distributed) and share it across different Flink programs in a module?

5) How can I make the records arrive in sequence at a map (or any other) operator?

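For reference, here is a minimal sketch of how I understood the stateful-mapper
approach from your answer: a RichMapFunction that keeps the coefficients in an
instance field, run with parallelism 1 so there is only one copy of that state.
The class name, record type and update rule below are just illustrative
placeholders, not my actual job:

import java.util.Arrays;

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;

public class StatefulMapSketch {

    // Keeps the coefficient vector as an instance field. With parallelism 1
    // there is a single instance of this mapper, so every record sees the
    // coefficients as updated by all previous records.
    public static class CoefficientMapper
            extends RichMapFunction<Tuple2<Double, Double>, String> {

        private double[] coefficients;   // state kept across records

        @Override
        public void open(Configuration parameters) {
            coefficients = new double[]{0.0, 0.0};   // starts at 0.0
        }

        @Override
        public String map(Tuple2<Double, Double> record) {
            // Placeholder update rule; a real job would apply the actual
            // regression update step for this record here.
            coefficients[0] += record.f0;
            coefficients[1] += record.f1;
            return Arrays.toString(coefficients);
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        DataSet<Tuple2<Double, Double>> data = env.fromElements(
                new Tuple2<>(1.0, 2.0),
                new Tuple2<>(3.0, 4.0));

        data.map(new CoefficientMapper())
            .setParallelism(1)   // one instance -> one global copy of the state
            .print();            // print() is a sink and triggers execution
    }
}

If I understand correctly, with a parallelism greater than 1 each parallel
instance would only see the coefficients built from its own share of the
records, which is what my first question above is about.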

Regards,
Ravikumar


On 8 June 2016 at 21:14, Till Rohrmann <trohrm...@apache.org> wrote:

> Hi Ravikumar,
>
> Flink's operators are stateful. So you can simply create a variable in
> your mapper to keep the state around. But every mapper instance will have
> its own state. This state is determined by the records which are sent to
> this mapper instance. If you need a global state, then you have to set the
> parallelism to 1.
>
> Cheers,
> Till
>
> On Wed, Jun 8, 2016 at 5:08 PM, Ravikumar Hawaldar <
> ravikumar.hawal...@gmail.com> wrote:
>
>> Hello,
>>
>> I have a DataSet<UserDefinedType>, where each element is roughly one record
>> of a file.
>>
>> Now I am using a map transformation on this DataSet to compute a variable
>> (the coefficients of a linear regression, stored in a double[]).
>>
>> The issue is that the variable gets updated per record, and I am struggling
>> to carry the state of this variable over to the next record.
>>
>> Put simply, for the first record the variable's values are 0.0; after the
>> first record the variable is updated, and this updated variable has to be
>> passed on to the second record, and so on for all records in the DataSet.
>>
>> Any suggestions on how to maintain the state of such a variable?
>>
>>
>> Regards,
>> Ravikumar
>>
>
>