You can also try to make the decision on the client. Imagine a program
like this:

// count() immediately triggers execution of this first program
// and returns the result to the client
long count = env.readFile(...).filter(...).count();

// the branch is evaluated on the client; each case defines a new program
if (count > 5) {
  env.readFile(...).map(...).join(...).reduce(...);
} else {
  env.readFile(...).filter(...).coGroup(...).map(...);
}



On Mon, Nov 2, 2015 at 1:35 AM, Fabian Hueske <fhue...@gmail.com> wrote:

> Hi Giacomo,
>
> there is no direct support for use cases like yours. The main issue is that
> it is not possible to modify the execution of a submitted program. Once it
> is running, it cannot be adapted. It is also not possible to inject a
> condition into the data flow logic, e.g., if this condition holds, follow
> this branch of the flow, otherwise the other one.
>
> However, the following workaround might work for you:
> Once the condition to modify the running program becomes true, you can
> stop the running job by filtering out all records. This is the only way to
> gracefully quit a job (throwing an exception would also kill the job, but
> might not work well if you still want to store some of the job's results).
> The record filtering can be done by a filter function that reads the filter
> condition from a broadcast variable.
> If the program finishes due to the condition, you can start a new program
> with the alternative data source.
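>
> A minimal sketch of such a filter, assuming the stop condition is computed
> as a single-element DataSet<Boolean> named "stopSignal" (the names and the
> Person type are just for illustration):
>
> import java.util.List;
> import org.apache.flink.api.common.functions.RichFilterFunction;
> import org.apache.flink.configuration.Configuration;
>
> public static class StopFilter extends RichFilterFunction<Person> {
>   private boolean stop;
>
>   @Override
>   public void open(Configuration parameters) {
>     // read the stop condition from the broadcast variable
>     List<Boolean> signal =
>         getRuntimeContext().getBroadcastVariable("stopSignal");
>     stop = !signal.isEmpty() && signal.get(0);
>   }
>
>   @Override
>   public boolean filter(Person p) {
>     // drop all records once the stop condition holds
>     return !stop;
>   }
> }
>
> // attach the broadcast set where the filter is used:
> a.filter(new StopFilter()).withBroadcastSet(stopSignal, "stopSignal");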
>
> This is a bit hacky, but I don't see a different way to do it.
>
> Cheers, Fabian
>
> 2015-10-31 11:53 GMT+01:00 Giacomo Licari <giacomo.lic...@gmail.com>:
>
>> Hi Fabian,
>> thanks a lot for your solution.
>>
>> Just another question: do you think it is possible to execute operations
>> on the C dataset inside the filter or map operators (or any operator) when
>> some condition is met, instead of waiting for the entire A dataset to be
>> processed?
>>
>> My goal is the following:
>> if, while processing the A dataset, some condition is met, stop the
>> operations on the A dataset and execute operations on the C dataset.
>>
>> Some pseudocode from your solution:
>> DataSet<X> A = env.readFile(...);
>> DataSet<X> C = env.readFile(...);
>>
>> A.groupBy().reduce().filter(/* check conditions here and, if they hold,
>> start processing C */);
>>
>>
>> Thanks,
>> Giacomo
>>
>>
>>
>>
>> On Fri, Oct 30, 2015 at 11:02 PM, Fabian Hueske <fhue...@gmail.com>
>> wrote:
>>
>>> You refer to the DataSet (batch) API, right?
>>>
>>> In that case you can evaluate your condition in the program and fetch a
>>> DataSet back to the client using List<X> myData = myDataSet.collect();
>>> Based on the result of the collect() call, you can define and execute a
>>> new program.
>>>
>>> Note: collect() will immediately trigger the execution of the program in
>>> its current state and bring the result back to the client. There is also
>>> a size limit on the results that can be fetched back: the Akka framesize,
>>> which is 10MB by default but can be adapted.
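>>>
>>> For reference, the limit is the akka.framesize parameter in
>>> flink-conf.yaml. A hypothetical example raising it to 64MB:
>>>
>>> akka.framesize: 67108864b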
>>>
>>> It would look similar to this:
>>>
>>> ExecutionEnvironment env = ...
>>>
>>> DataSet<X> a = env.readFile(...);
>>> List<Y> b = a.groupBy().reduce().filter().collect();
>>>
>>> DataSet<Z> c;
>>> if(b.get(0).equals(...)) {
>>>   c = env.readFile(someFile);
>>> } else {
>>>   c = env.readFile(someOtherFile);
>>> }
>>>
>>> c.map().groupBy().reduce()...writeAsText(result);
>>>
>>> env.execute();
>>>
>>> Cheers, Fabian
>>>
>>> 2015-10-30 22:40 GMT+01:00 Giacomo Licari <giacomo.lic...@gmail.com>:
>>>
>>>> Hi guys,
>>>> I would like to ask you how I could create triggers in Flink.
>>>>
>>>> I would like to perform some operations on a dataset and, when some
>>>> conditions based on an attribute of a Pojo class or Tuple are met,
>>>> execute some triggers.
>>>> I mean, start collecting data from other data sources and perform
>>>> operations on it.
>>>>
>>>> An example: I have a dataset of the Pojo class Person. My trigger
>>>> activation condition is (number of Italian people > 100).
>>>> If it holds, I collect another data source and execute operations on it.
>>>>
>>>> Do you think that is possible in Flink?
>>>>
>>>> Thanks,
>>>> Giacomo
>>>>
>>>
>>>
>>
>
