You can also try to make the decision on the client. Imagine a program like this:
long count = env.readFile(...).filter(...).count();

if (count > 5) {
  env.readFile(...).map(...).join(...).reduce(...);
} else {
  env.readFile(...).filter(...).coGroup(...).map(...);
}

On Mon, Nov 2, 2015 at 1:35 AM, Fabian Hueske <fhue...@gmail.com> wrote:

> Hi Giacomo,
>
> there is no direct support for use cases like yours. The main issue is that
> it is not possible to modify the execution of a submitted program. Once it
> is running, it cannot be adapted. It is also not possible to inject a
> condition into the data flow logic, e.g., if this happens follow this flow
> branch, otherwise the other one.
>
> However, the following workaround might work for you:
> Once the condition to modify the running program becomes true, you can
> stop the running job by filtering out all records. This is the only way to
> gracefully quit a job (throwing an exception would also kill the job, but
> might not work well if you still want to store some of the job's results).
> The record filtering can be done by a filter function that reads the filter
> condition from a broadcast variable.
> If the program finishes due to the condition, you can start a new program
> with the alternative data source.
>
> This is a bit hacky, but I don't see a different way to do it.
>
> Cheers, Fabian
>
> 2015-10-31 11:53 GMT+01:00 Giacomo Licari <giacomo.lic...@gmail.com>:
>
>> Hi Fabian,
>> thanks a lot for your solution.
>>
>> Just another question: do you think it is possible to execute operations
>> on the C dataset inside filter or map operators (or any operator) when
>> some conditions appear, instead of waiting for the entire A dataset to
>> be processed?
>>
>> My purpose is:
>> if, while processing the A dataset, some condition appears, stop executing
>> operations on the A dataset and execute operations on the C dataset.
>>
>> Some pseudocode based on your solution:
>>
>> DataSet<X> A = env.readFile(...);
>> DataSet<X> C = env.readFile(...);
>>
>> A.groupBy().reduce().filter(/* check conditions here and, if met,
>>     start processing C */);
>>
>> Thanks,
>> Giacomo
>>
>> On Fri, Oct 30, 2015 at 11:02 PM, Fabian Hueske <fhue...@gmail.com> wrote:
>>
>>> You refer to the DataSet (batch) API, right?
>>>
>>> In that case you can evaluate your condition in the program and fetch a
>>> DataSet back to the client using List<X> myData = DataSet<X>.collect();
>>> Based on the result of the collect() call you can define and execute a
>>> new program.
>>>
>>> Note: collect() will immediately trigger the execution of the program in
>>> its current state and bring the result back to the client. There is also
>>> a size limitation on results that can be fetched back. This is the Akka
>>> framesize, which is 10 MB by default but can be adapted.
>>>
>>> It would look similar to this:
>>>
>>> ExecutionEnvironment env = ...
>>>
>>> DataSet<X> a = env.readFile(...);
>>> List<Y> b = a.groupBy().reduce().filter().collect();
>>>
>>> DataSet<Z> c;
>>> if (b.get(0).equals(...)) {
>>>   c = env.readFile(someFile);
>>> } else {
>>>   c = env.readFile(someOtherFile);
>>> }
>>>
>>> c.map().groupBy().reduce()....writeAsFile(result);
>>>
>>> env.execute();
>>>
>>> Cheers, Fabian
>>>
>>> 2015-10-30 22:40 GMT+01:00 Giacomo Licari <giacomo.lic...@gmail.com>:
>>>
>>>> Hi guys,
>>>> I would like to ask you how I could create triggers in Flink.
>>>>
>>>> I would like to perform some operations on a dataset and, according to
>>>> some conditions based on an attribute of a POJO class or Tuple, execute
>>>> some triggers.
>>>> I mean, start collecting other datasources' data and performing
>>>> operations over them.
>>>>
>>>> An example:
>>>> I have a dataset of the POJO class Person. My trigger activation
>>>> condition is (number of Italian people > 100).
>>>> If so, I collect another datasource and execute operations over it.
>>>>
>>>> Do you think that is possible in Flink?
>>>>
>>>> Thanks,
>>>> Giacomo
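
The collect()-then-branch approach discussed above boils down to ordinary client-side control flow between two executions: materialize an intermediate result, inspect it, then build the follow-up program accordingly. A minimal sketch of that pattern in plain Java, using java.util.stream as a stand-in for Flink's DataSet API (a running Flink cluster isn't assumed here, and all names such as firstSource, sourceA, and sourceB are illustrative):

```java
import java.util.List;
import java.util.stream.Collectors;

public class BranchOnResult {
    // Stand-in for the pattern in the thread: run a first "job" to
    // completion, inspect its result on the client, then choose the
    // input of the second "job" based on it.
    static List<String> run(List<String> firstSource,
                            List<String> sourceA,
                            List<String> sourceB) {
        // Step 1: materialize the intermediate result on the client
        // (analogous to DataSet.count() or DataSet.collect()).
        long count = firstSource.stream()
                                .filter(s -> s.startsWith("it"))
                                .count();

        // Step 2: plain Java control flow picks the input of the
        // follow-up program (analogous to choosing env.readFile(...)).
        List<String> chosen = (count > 2) ? sourceA : sourceB;

        // Step 3: the follow-up "job" runs on the chosen source.
        return chosen.stream()
                     .map(String::toUpperCase)
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Three records match the filter, so count > 2 and sourceA wins.
        List<String> result = run(
            List.of("it-1", "it-2", "it-3", "de-1"),
            List.of("a"),
            List.of("b"));
        System.out.println(result);  // prints [A]
    }
}
```

The decision itself runs on the client between the two executions, which matches the limitation described above: a submitted Flink program cannot be modified, but two separate programs can be chained through ordinary code.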