Thank you, Fabian! If batch semantics are not important to my use case, is 
there any way to "downgrade" or convert a DataSet to a DataStream?

BR
/Magnus

> On 17 Oct 2017, at 10:54, Fabian Hueske <fhue...@gmail.com> wrote:
> 
> Hi Magnus,
> 
> there is no Split operator on the DataSet API.
> 
> As you said, this can be done using a FilterFunction. This also allows for 
> non-binary splits:
> 
> DataSet<X> setToSplit = ...
> DataSet<X> firstSplit = setToSplit.filter(new SplitCondition1());
> DataSet<X> secondSplit = setToSplit.filter(new SplitCondition2());
> DataSet<X> thirdSplit = setToSplit.filter(new SplitCondition3());
> 
> where SplitCondition1, SplitCondition2, and SplitCondition3 are 
> FilterFunctions that each filter out all records that don't belong to their split.
> 
> Best, Fabian
> 
> 2017-10-17 10:42 GMT+02:00 Magnus Vojbacke <magnus.vojba...@gmail.com>:
> I'm looking for something like DataStream.split(), but for DataSets. I'd like 
> to split my streaming data so messages go to different parts of an execution 
> graph, based on arbitrary logic.
> 
> DataStream.split() seems to be perfect, except that my source is a CSV file, 
> and I have only found built-in functions for reading CSV files into a DataSet.
> 
> I've evaluated using DataSet.filter(), but as far as I can tell, that only 
> allows me to emulate a yes/no split. This is not ideal because it's too 
> coarse, and I would prefer a more fine-grained split than that.
> 
> 
> Do you have any suggestions on how I can achieve my arbitrary splitting logic 
> for a) DataSets in general, or b) CSV files?
> 
> 
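The one-filter-per-split idea from Fabian's reply can be sketched in plain Java. This is only an illustration of the pattern, not Flink code: java.util.function.Predicate stands in for Flink's FilterFunction, a List stands in for the DataSet, and the class name SplitSketch, the helper filter(), and the three numeric conditions are hypothetical names chosen for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class SplitSketch {
    // Keep only the records that satisfy one split condition
    // (plays the role of DataSet.filter(new SplitConditionN())).
    public static <T> List<T> filter(List<T> input, Predicate<T> condition) {
        return input.stream().filter(condition).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> setToSplit = Arrays.asList(-5, -1, 0, 3, 42, 100);

        // Three mutually exclusive conditions give a three-way split.
        List<Integer> firstSplit = filter(setToSplit, x -> x < 0);
        List<Integer> secondSplit = filter(setToSplit, x -> x >= 0 && x < 10);
        List<Integer> thirdSplit = filter(setToSplit, x -> x >= 10);

        System.out.println(firstSplit + " " + secondSplit + " " + thirdSplit);
    }
}
```

Each split is produced by its own pass over the same input. A record that matches none of the conditions ends up in no split, and a record that matches several conditions ends up in several splits, so the conditions need to be chosen to cover the input without overlap if a strict partition is wanted.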
