Re: Dataset split/demultiplex

2016-05-13 Thread Gábor Gévay
I would like to add that if your predicate does some heavy-weight computation that you want to avoid duplicating for the filters, then you can insert a map before the filters, where you evaluate the predicate and put the result into a field. Best, Gabor 2016-05-13 11:51 GMT+02:00 Fabian Hueske

Re: Dataset split/demultiplex

2016-05-13 Thread Fabian Hueske
Hi, it is true that Gabor's approach of using two filters has a certain overhead. However, the overhead should be reasonable. The data stays on the same node and the filter can be very lightweight. I agree that this is not a very nice solution. However, modifying the DataSet API such that an oper

Re: Dataset split/demultiplex

2016-05-12 Thread CPC
Hi, if it just require implementing a custom operator(i mean does not require changes to network stack or other engine level changes) i can try to implement it since i am working on optimizer and plan generation for a month. Also we are going to implement our etl framework on flink and this kind

Re: Dataset split/demultiplex

2016-05-12 Thread Aljoscha Krettek
Hi, I agree that this would be very nice. Unfortunately Flink does only allow one output from an operation right now. Maybe we can extends this somehow in the future. Cheers, Aljoscha On Thu, 12 May 2016 at 17:27 CPC wrote: > Hi Gabor, > > Yes functionally this helps. But in this case i am proc

Re: Dataset split/demultiplex

2016-05-12 Thread CPC
Hi Gabor, Yes functionally this helps. But in this case i am processing an element twice and sending whole data to two different operator . What i am trying to achieve is like datastream split like functionality or a little bit more: In filter like scenario i want to do below pseudo operation:

Re: Dataset split/demultiplex

2016-05-12 Thread Gábor Gévay
Hello, You can split a DataSet into two DataSets with two filters: val xs: DataSet[A] = ... val split1: DataSet[A] = xs.filter(f1) val split2: DataSet[A] = xs.filter(f2) where f1 and f2 are true for those elements that should go into the first and second DataSets respectively. So far, the splits

Dataset split/demultiplex

2016-05-12 Thread CPC
Hi folks, Is there any way in dataset api to split Dataset[A] to Dataset[A] and Dataset[B] ? Use case belongs to a custom filter component that we want to implement. We will want to direct input elements whose result is false after we apply the predicate. Actually we want to direct input elements