For streaming, I am a bit torn about whether reading a file should have such prominent functions. Most streaming programs work on message queues, or on monitored directories.
Not saying no, but not sure DataSet/DataStream parity is the main goal - they are for different use cases after all...

On Wed, Nov 25, 2015 at 8:22 AM, Chiwan Park <chiwanp...@apache.org> wrote:
> Thanks for the correction @Fabian. :)
>
> > On Nov 25, 2015, at 4:40 AM, Suneel Marthi <smar...@apache.org> wrote:
> >
> > Guess it makes sense to add readHadoopXXX() methods to StreamExecutionEnvironment (for feature parity with what exists presently in ExecutionEnvironment).
> >
> > Also, Flink-2949 addresses the need to add relevant syntactic sugar wrappers in the DataSet API for the code snippet in Fabian's previous email. It's not cool having to instantiate a JobConf in client code and having to pass that around.
> >
> > On Tue, Nov 24, 2015 at 2:26 PM, Fabian Hueske <fhue...@gmail.com> wrote:
> > Hi Nick,
> >
> > you can use Flink's HadoopInputFormat wrappers also for the DataStream API. However, DataStream does not offer as much "sugar" as DataSet because StreamEnvironment does not offer dedicated createHadoopInput or readHadoopFile methods.
> >
> > In DataStream Scala you can read from a Hadoop InputFormat (TextInputFormat in this case) as follows:
> >
> > val textData: DataStream[(LongWritable, Text)] = env.createInput(
> >   new HadoopInputFormat[LongWritable, Text](
> >     new TextInputFormat,
> >     classOf[LongWritable],
> >     classOf[Text],
> >     new JobConf()
> >   ))
> >
> > The Java version is very similar.
> >
> > Note: Flink has wrappers for both MR APIs: mapred and mapreduce.
> >
> > Cheers,
> > Fabian
> >
> > 2015-11-24 19:36 GMT+01:00 Chiwan Park <chiwanp...@apache.org>:
> > I'm not a streaming expert. AFAIK, the layer can be used only with DataSet. There are some streaming-specific features in Flink, such as distributed snapshots. These need support from sources and sinks, so you have to implement the I/O.
> >
> > > On Nov 25, 2015, at 3:22 AM, Nick Dimiduk <ndimi...@gmail.com> wrote:
> > >
> > > I completely missed this, thanks Chiwan. Can these be used with DataStreams as well as DataSets?
> > >
> > > On Tue, Nov 24, 2015 at 10:06 AM, Chiwan Park <chiwanp...@apache.org> wrote:
> > > Hi Nick,
> > >
> > > You can use Hadoop Input/Output Formats without modification! Please check the documentation [1] on the Flink homepage.
> > >
> > > [1] https://ci.apache.org/projects/flink/flink-docs-release-0.10/apis/hadoop_compatibility.html
> > >
> > > > On Nov 25, 2015, at 3:04 AM, Nick Dimiduk <ndimi...@apache.org> wrote:
> > > >
> > > > Hello,
> > > >
> > > > Is it possible to use existing Hadoop Input and OutputFormats with Flink? There's a lot of existing code that conforms to these interfaces; it seems a shame to have to re-implement it all. Perhaps some adapter shim..?
> > > >
> > > > Thanks,
> > > > Nick
> > >
> > > Regards,
> > > Chiwan Park
> >
> > Regards,
> > Chiwan Park
>
> Regards,
> Chiwan Park
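Since Fabian notes "The Java version is very similar" without spelling it out, here is a minimal sketch of what the Java DataStream equivalent of his Scala snippet might look like. This is an untested illustration, not a verified program: it assumes the Flink 0.10-era mapred-based HadoopInputFormat wrapper and the flink-hadoop-compatibility jar on the classpath, and the exact package of HadoopInputFormat changed between releases, so treat the imports as assumptions.

```java
// Sketch only - assumes flink-streaming-java and flink-hadoop-compatibility
// dependencies; package names below are assumptions and moved across releases.
import org.apache.flink.api.java.hadoop.mapred.HadoopInputFormat;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;

public class HadoopInputFormatStreamExample {
  public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env =
        StreamExecutionEnvironment.getExecutionEnvironment();

    // As in the Scala snippet, the JobConf is created in client code;
    // TextInputFormat reads its input paths from this JobConf.
    JobConf jobConf = new JobConf();

    DataStream<Tuple2<LongWritable, Text>> textData = env.createInput(
        new HadoopInputFormat<LongWritable, Text>(
            new TextInputFormat(),
            LongWritable.class,
            Text.class,
            jobConf));

    textData.print();
    env.execute("Hadoop TextInputFormat via DataStream");
  }
}
```

As Suneel points out, Flink-2949 is about hiding exactly this JobConf plumbing behind syntactic sugar; the sketch above shows why that sugar would be welcome.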