Hi Ravikumar,

I'll try to answer your questions:
1) If you set the parallelism of a map function to 1, there will be only a
single instance of that function, regardless of whether it is executed
locally or remotely in a cluster (see the first sketch below).
2) Flink also supports aggregations (reduce, groupReduce, combine, ...),
see the second sketch below. However, I do not see how this would help
with a stateful map function.
3) In Flink DataSet programs you usually construct the complete program and
call execute() after you have defined your sinks. There are two exceptions:
print() and collect(), which both add special sinks and immediately execute
your program. print() prints the result to the stdout of the submitting
client and collect() fetches a DataSet as a local collection (see the third
sketch below).
4) I am not sure I understood your question. When you obtain an
ExecutionEnvironment with ExecutionEnvironment.getExecutionEnvironment(),
the type of the returned environment depends on the context in which the
program is executed. It can be a local environment if it is executed from
within an IDE or a RemoteExecutionEnvironment if the program is executed
via the CLI client and shipped to a remote cluster.
5) A map operator processes records one after the other, i.e., as a
sequence. If you need a certain order, you can call DataSet.sortPartition()
to locally sort the partition (see the last sketch below).
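
To illustrate 1) and the stateful map discussion below, here is a minimal
sketch (UserRecord, NUM_FEATURES, and the update logic are placeholders,
not taken from your code) of a RichMapFunction that keeps its coefficients
in an instance field and runs as a single instance:

DataSet<UserRecord> records = ...; // your input

DataSet<double[]> coefficients = records
    .map(new RichMapFunction<UserRecord, double[]>() {
        // instance field: holds the state across all records seen by this instance
        private double[] coeffs;

        @Override
        public void open(Configuration parameters) {
            coeffs = new double[NUM_FEATURES]; // initialized to 0.0
        }

        @Override
        public double[] map(UserRecord record) {
            // update coeffs from the record (your regression update goes here)
            return coeffs;
        }
    })
    .setParallelism(1); // only one instance, so its state is effectively global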
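
For 2), just to show what I mean by aggregations, a tiny sketch that sums a
DataSet of doubles with reduce (only an illustration, not a replacement for
your stateful map):

DataSet<Double> values = env.fromElements(1.0, 2.0, 3.0); // env: an ExecutionEnvironment
DataSet<Double> sum = values.reduce((a, b) -> a + b);     // pairwise aggregation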
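
For 3) and 4), a sketch of the overall program structure (the output path
and job name are made up):

// local environment in the IDE, RemoteExecutionEnvironment when submitted
// via the CLI client
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

DataSet<String> lines = env.fromElements("a", "b", "a");

lines.print();                        // adds a sink and executes immediately,
                                      // printing to the client's stdout
List<String> local = lines.collect(); // adds a sink, executes, and fetches the
                                      // result as a local collection

lines.writeAsText("/tmp/out");        // a regular sink: nothing happens yet
env.execute("my job");                // this triggers the execution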
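
And for 5), a sketch of sortPartition (the field name "timestamp" and
MyMapper are placeholders for your own type and function):

// locally sort each partition before the map processes it
records
    .sortPartition("timestamp", Order.ASCENDING)
    .map(new MyMapper());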

Hope that helps,
Fabian

2016-06-09 12:23 GMT+02:00 Ravikumar Hawaldar <ravikumar.hawal...@gmail.com>:

> Hi Till, Thank you for your answer, I have couple of questions
>
> 1) Setting the parallelism on a single map function works locally, but
> will it work the same way when executed on a distributed cluster?
>
> 2) Is there any other way apart from setting parallelism? Something like
> Spark's aggregate function?
>
> 3) Is it necessary to call the execute function after the transformations?
> Or does execution start as soon as an action is encountered (similar to
> Spark)?
>
> 4) Can I create a global execution environment (either local or
> distributed) for different Flink programs in a module?
>
> 5) How can I make the records arrive in sequence for a map or any other
> operator?
>
>
> Regards,
> Ravikumar
>
>
> On 8 June 2016 at 21:14, Till Rohrmann <trohrm...@apache.org> wrote:
>
>> Hi Ravikumar,
>>
>> Flink's operators are stateful. So you can simply create a variable in
>> your mapper to keep the state around. But every mapper instance will have
>> its own state. This state is determined by the records which are sent to
>> this mapper instance. If you need a global state, then you have to set the
>> parallelism to 1.
>>
>> Cheers,
>> Till
>>
>> On Wed, Jun 8, 2016 at 5:08 PM, Ravikumar Hawaldar <
>> ravikumar.hawal...@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> I have a DataSet<UserDefinedType>, where each element is roughly a record
>>> from a DataSet or a file.
>>>
>>> Now I am using a map transformation on this DataSet to compute a variable
>>> (the coefficients of the linear regression parameters; the data structure
>>> used is a double[]).
>>>
>>> The issue is that the variable gets updated per record, and I am
>>> struggling to maintain the state of this variable for the next record.
>>>
>>> Put simply, for the first record the variable values will be 0.0; after
>>> the first record the variable gets updated, and I have to pass this
>>> updated variable on to the second record, and so on for all records in
>>> the DataSet.
>>>
>>> Any suggestions on how to maintain state of a variable?
>>>
>>>
>>> Regards,
>>> Ravikumar
>>>
>>
>>
>
