Unfortunately, it's not possible to bridge the gap between the DataSet and DataStream APIs.
However, you can use a CsvInputFormat in the DataStream API as well. Since there's no built-in API to configure the CSV input, you would have to create (and configure) the CsvInputFormat yourself. Once you have the CsvInputFormat, you can create a DataStream using StreamExecutionEnvironment.readFile(csvIF, filePath).

Hope this helps,
Fabian

2017-10-17 11:05 GMT+02:00 Magnus Vojbacke <magnus.vojba...@gmail.com>:

> Thank you, Fabian! If batch semantics are not important to my use case, is
> there any way to "downgrade" or convert a DataSet to a DataStream?
>
> BR
> /Magnus
>
> On 17 Oct 2017, at 10:54, Fabian Hueske <fhue...@gmail.com> wrote:
>
> Hi Magnus,
>
> there is no Split operator on the DataSet API.
>
> As you said, this can be done using a FilterFunction. This also allows for
> non-binary splits:
>
> DataSet<X> setToSplit = ...
> DataSet<X> firstSplit = setToSplit.filter(new SplitCondition1());
> DataSet<X> secondSplit = setToSplit.filter(new SplitCondition2());
> DataSet<X> thirdSplit = setToSplit.filter(new SplitCondition3());
>
> where SplitCondition1, SplitCondition2, and SplitCondition3 are
> FilterFunctions that filter out all records that don't belong to the split.
>
> Best, Fabian
>
> 2017-10-17 10:42 GMT+02:00 Magnus Vojbacke <magnus.vojba...@gmail.com>:
>
>> I'm looking for something like DataStream.split(), but for DataSets. I'd
>> like to split my streaming data so messages go to different parts of an
>> execution graph, based on arbitrary logic.
>>
>> DataStream.split() seems to be perfect, except that my source is a CSV
>> file, and I have only found built in functions for reading CSV files into a
>> DataSet.
>>
>> I've evaluated using DataSet.filter(), but as far as I can tell, that
>> only allows me to emulate a yes/no split. This is not ideal because it's
>> too coarse, and I would prefer a more fine grained split than that.
>>
>> Do you have any suggestions on how I can achieve my arbitrary splitting
>> logic for a) DataSets in general, or b) CSV files?
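
For illustration, the one-filter-per-split idea from the quoted reply can be sketched in plain Java, without any Flink dependency. This is only an analogy under stated assumptions: `FilterSplitSketch`, its `split` helper, and the sample records are hypothetical names made up here, and `java.util.function.Predicate` stands in for Flink's FilterFunction. It shows that applying N independent conditions to the same input gives an N-way split, not just a yes/no one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Plain-Java stand-in (not Flink code) for the filter-per-split pattern.
class FilterSplitSketch {

    // Applies every named condition to the same input, much like calling
    // filter() once per split on the same DataSet.
    static <T> Map<String, List<T>> split(List<T> input,
                                          Map<String, Predicate<T>> conditions) {
        Map<String, List<T>> splits = new LinkedHashMap<>();
        for (Map.Entry<String, Predicate<T>> condition : conditions.entrySet()) {
            List<T> matching = new ArrayList<>();
            for (T record : input) {
                // Keep only the records that belong to this split.
                if (condition.getValue().test(record)) {
                    matching.add(record);
                }
            }
            splits.put(condition.getKey(), matching);
        }
        return splits;
    }

    public static void main(String[] args) {
        // Hypothetical CSV-like records; the first column decides the split.
        List<String> records =
                Arrays.asList("order,1", "refund,2", "order,3", "other,4");

        Map<String, Predicate<String>> conditions = new LinkedHashMap<>();
        conditions.put("orders", r -> r.startsWith("order,"));
        conditions.put("refunds", r -> r.startsWith("refund,"));
        conditions.put("rest",
                r -> !r.startsWith("order,") && !r.startsWith("refund,"));

        System.out.println(split(records, conditions));
    }
}
```

Note that each condition scans the whole input once, and a record can land in several splits (or none) unless the conditions are written to be mutually exclusive and exhaustive.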