I agree with Stephan. Reading static files is quite uncommon with the DataStream API. Before we add such a method, we should add a convenience method for Kafka ;) But in general, I'm not a big fan of adding too many of these methods, because they pull in so many external classes, which leads to breaking API changes, dependency issues, etc.
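To be fair, reading from Kafka is already only a few lines with the connector we have. A minimal sketch, assuming the current flink-connector-kafka classes (FlinkKafkaConsumer082, SimpleStringSchema); the broker, ZooKeeper, and topic settings below are made up:

// Sketch only: assumes flink-connector-kafka as of this release;
// broker, ZooKeeper, and topic values are hypothetical.
import java.util.Properties

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer082
import org.apache.flink.streaming.util.serialization.SimpleStringSchema

object KafkaSourceSketch {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.setProperty("bootstrap.servers", "localhost:9092") // hypothetical
    props.setProperty("zookeeper.connect", "localhost:2181") // hypothetical
    props.setProperty("group.id", "demo")

    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // addSource plus the consumer is all a readKafka() method would wrap.
    val lines: DataStream[String] = env.addSource(
      new FlinkKafkaConsumer082[String]("my-topic", new SimpleStringSchema(), props))

    lines.print()
    env.execute("Kafka source sketch")
  }
}

A readKafka() convenience method would only wrap these few lines, and it would drag the connector's dependencies into the core API, which is exactly the problem I mean.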
I think such issues can be addressed easily with good documentation (maybe
in the "Best practices" guide), good answers on Stack Overflow, and so on.

On Wed, Nov 25, 2015 at 12:12 PM, Stephan Ewen <se...@apache.org> wrote:

> For streaming, I am a bit torn on whether reading a file should have such
> prominent functions. Most streaming programs work on message queues, or
> on monitored directories.
>
> Not saying no, but not sure DataSet/DataStream parity is the main goal -
> they are for different use cases after all...
>
> On Wed, Nov 25, 2015 at 8:22 AM, Chiwan Park <chiwanp...@apache.org> wrote:
>
>> Thanks for the correction, @Fabian. :)
>>
>> > On Nov 25, 2015, at 4:40 AM, Suneel Marthi <smar...@apache.org> wrote:
>> >
>> > Guess it makes sense to add readHadoopXXX() methods to
>> > StreamExecutionEnvironment (for feature parity with what presently
>> > exists in ExecutionEnvironment).
>> >
>> > Also, FLINK-2949 addresses the need to add the relevant syntactic
>> > sugar wrappers to the DataSet API for the code snippet in Fabian's
>> > previous email. It's not cool having to instantiate a JobConf in
>> > client code and having to pass that around.
>> >
>> > On Tue, Nov 24, 2015 at 2:26 PM, Fabian Hueske <fhue...@gmail.com> wrote:
>> > Hi Nick,
>> >
>> > you can use Flink's HadoopInputFormat wrappers also for the DataStream
>> > API. However, DataStream does not offer as much "sugar" as DataSet,
>> > because StreamExecutionEnvironment does not offer dedicated
>> > createHadoopInput or readHadoopFile methods.
>> >
>> > In the DataStream Scala API you can read from a Hadoop InputFormat
>> > (TextInputFormat in this case) as follows:
>> >
>> > val textData: DataStream[(LongWritable, Text)] = env.createInput(
>> >   new HadoopInputFormat[LongWritable, Text](
>> >     new TextInputFormat,
>> >     classOf[LongWritable],
>> >     classOf[Text],
>> >     new JobConf()
>> >   ))
>> >
>> > The Java version is very similar.
>> >
>> > Note: Flink has wrappers for both MR APIs: mapred and mapreduce.
>> >
>> > Cheers,
>> > Fabian
>> >
>> > 2015-11-24 19:36 GMT+01:00 Chiwan Park <chiwanp...@apache.org>:
>> > I'm not a streaming expert. AFAIK, the compatibility layer can be used
>> > only with DataSet. Flink has some streaming-specific features, such as
>> > distributed snapshots, that need support from sources and sinks, so
>> > you would have to implement the I/O yourself.
>> >
>> > > On Nov 25, 2015, at 3:22 AM, Nick Dimiduk <ndimi...@gmail.com> wrote:
>> > >
>> > > I completely missed this, thanks Chiwan. Can these be used with
>> > > DataStreams as well as DataSets?
>> > >
>> > > On Tue, Nov 24, 2015 at 10:06 AM, Chiwan Park <chiwanp...@apache.org> wrote:
>> > > Hi Nick,
>> > >
>> > > You can use Hadoop Input/Output Formats without modification! Please
>> > > check the documentation [1] on the Flink homepage.
>> > >
>> > > [1] https://ci.apache.org/projects/flink/flink-docs-release-0.10/apis/hadoop_compatibility.html
>> > >
>> > > > On Nov 25, 2015, at 3:04 AM, Nick Dimiduk <ndimi...@apache.org> wrote:
>> > > >
>> > > > Hello,
>> > > >
>> > > > Is it possible to use existing Hadoop Input and OutputFormats with
>> > > > Flink? There's a lot of existing code that conforms to these
>> > > > interfaces; it seems a shame to have to re-implement it all.
>> > > > Perhaps some adapter shim..?
>> > > >
>> > > > Thanks,
>> > > > Nick
>> > >
>> > > Regards,
>> > > Chiwan Park
>> >
>> > Regards,
>> > Chiwan Park
>>
>> Regards,
>> Chiwan Park
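P.S. For anyone finding this thread in the archives: here is a self-contained version of Fabian's snippet above, with the imports it needs. This is only a sketch; it assumes the mapred wrapper from flink-hadoop-compatibility as described in the 0.10 docs linked above, and the input path is hypothetical:

// Sketch of Fabian's snippet with imports; the input path is made up.
import org.apache.flink.api.scala.hadoop.mapred.HadoopInputFormat
import org.apache.flink.streaming.api.scala._
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapred.{FileInputFormat, JobConf, TextInputFormat}

object HadoopTextSource {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Plain mapred configuration, as in any Hadoop job.
    val jobConf = new JobConf()
    FileInputFormat.addInputPath(jobConf, new Path("hdfs:///path/to/input")) // hypothetical

    // The wrapper exposes the Hadoop TextInputFormat as a Flink InputFormat.
    val textData: DataStream[(LongWritable, Text)] = env.createInput(
      new HadoopInputFormat[LongWritable, Text](
        new TextInputFormat, classOf[LongWritable], classOf[Text], jobConf))

    textData.map(_._2.toString).print()
    env.execute("Hadoop InputFormat as DataStream source")
  }
}

The JobConf handling here is the boilerplate FLINK-2949 wants to hide behind sugar methods on the DataSet side.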